BigDFT.BioQM module

A module to define typical operations that can be done on biological systems

load(archive, serialization_version=None, options={})[source]

Create a Biosystem from a serialization archive. It extracts the archive files in a temporary directory and loads the related information. Also it deletes the directory once the load is finished.

Parameters:
  • archive (path) – the path of the serialization archive

  • serialization_version (str) – version of the archive. Latest if not specified.

Returns:

Instance of the class, loaded from the archive

Return type:

BioSystem

serialize_biosystem(archive, posinp, logfile, version=None, **kwargs)[source]

Serialize the biosystem associated to the provided files

Parameters:
  • archive (path) – name of the archive to be employed for dumping

  • posinp (path) – pdbfile of the structure

  • logfile (path) – logfile of the structure

  • version (str) – version of the serialization

  • kwargs (**) – Arguments to be passed to BioSystem instantiation

class BioSystemSerialization(biosys, version=None)[source]

Serialization of the BioSystem class that would prevent the need to copy the entire matrices of the QM calculations

Parameters:
  • biosys (BioSystem) – a instance of the Biosystem class

  • version (str) – the version of the desired serialization. If absent, the default is considered.

version = '1.1'

version of the serialization

cached_attributes = {'1.0': {'cached_refragment': '_cached_refragment', 'fragindices': 'frag_indices', 'pairwise_BO': 'bond_orders', 'purities': 'purities'}, '1.1': {'interactions': 'interactions'}, '1.2': {'fragment_quantities': '_fragment_quantities'}, '2.0': {'atomic_BO': 'atomic_bond_orders', 'atomic_interactions': 'atomic_interactions', 'atomic_purities': '_atomic_purities', 'cached_refragment': '_cached_refragment'}}
dump(archive)[source]

Create an archive with the entire set of information of the Serialization. Such a tarfile should be such that the same analysis of the BioSystem is possible

class Sequence(seq)[source]

Initialize and define a sequence, may be amminoacidic or nitrogenous.

Parameters:

seq (Bio.Seq.Sequence) – the starting sequence

display(colors=None, labels=None, axs=None, boxcount=None, remove_letters=[])[source]

Display the sequence in a easy-to-read form

Parameters:
  • boxcount (int) – the number of boxes per row

  • remove_letters (list) – list of letter positions that will be removed

pdb_sequences(filename)[source]

Sequences of the chains of the proteins, in the order of the residue_list

Parameters:

filename (str) – protein file, in pdb format

Returns:

strings of the protein sequence in the FASTA sequence order

Return type:

list

read_structure(filename, name='protein')[source]

Extract a biological structure from a PDB file. Use the package Bio.PDB

Parameters:
  • filename (str) – path to the file of the structure

  • name (str) – the name of the structure

Returns:

(Bio.PDB.Structure) A structure instance

residue_list(structure, system, strict=True)[source]

Identify the residues of the sequence in the fragment of the system

Parameters:
  • structure (structure) – A structure read from the PDB parser of Bio.PDB

  • system (BigDFT.Systems.System) – a system coming from BigDFT package

  • strict (bool) – define if the identification has to be performend via centroids control or just by the residue name

Returns:

tuple of list the fragments of the system in the order of the

residues of the structure. Some of the fragments may be missing as they may have been merged of not associated to elements of the sequence. The second list is the list of the ordered residues.

Return type:

tuple

sequences_to_residues(reslist, chains)[source]

Identify for each element of the chains which are the residues that may be associated to them.

Parameters:
  • reslist (list) – the list of all the residues of the system

  • chains (list) – a list containing the sequences of the system

Returns:

a list of length of the chains indicating the lookup array for the

sequence

Return type:

list

class Graph(fragkeys, bond_orders, frag_labels, cutoff=0.01, restrict_to=None)[source]

A Graph representation of a System or a subportion of it.

Identify the connectivity of the system given a threshold, possibly restricted to a subset of fragments :param cutoff: the threshold of the cumulative fragment bond order

that is applied to define the connectivity

Parameters:

restrict_to (list) – list of the fragment id to which calculate the connectivity

display(colors=None, axs=None, edge_labels=None, node_shapes=None, edge_colors=None, node_edge_colors=None, colorbar_mappable=None, colorbar_label='', **kwargs)[source]

Draw the graph in the Kamada Kawai representation. Employ colors if present.

Parameters:
  • colors (list) – the list of the colors that have to be employed for all the graph nodes. The list has the size of the full keys list. Only target nodes are employed.

  • colorbar_mappable (matplotlib.colors.ScalarMappable) – mappable to be employed to draw a colorbar associated to the node colors.

  • colorbar_label (str) – label to be associated to the colorbar

  • edge_colors (list) – the list of the colors that have to be employed for each of the edges of the node

  • axs (matplotlib.axis) – the axis to draw on (optional)

  • edge_labels (dict) – a dictionary from tuples of node labels to a given label.

  • node_shapes (list) – list indicating the shapes of the nodes, in order of full keys. Only target nodes are used for the shape.

  • node_edge_colors (list) – the list of the colors that have to be employed for all the graph node borders. Same specs than the colors argument.

  • kwargs (dict) – any other argument you wish to pass to networkx. Those arguments are arranged in keys, among which common, nodes, edges, labels, edge_labels. Each values of the keys correspond to the arguments which should be passed to the corresponding draw_networkx_<> method.

construct_frag_tuple(frag)[source]

Build the fragment tuple given the fragment name.

Parameters:

frag (str) – the fragment name

Returns:

the (chain, residue, id) tuple name

Return type:

namedtuple

construct_res_tuple(res)[source]

Build the BigDFT fragment tuple given the residue of the structure

Parameters:

res (Residue) – A residue Class on the Biopython package

Returns:

the (chain, residue, id) tuple name

Return type:

namedtuple

residue_name(res)[source]

Name of the residue in the fragment specification of PyBigDFT

Parameters:

res (Residue) – A residue Class on the Biopython package

Returns:

name of the fragment

Return type:

str

biosystem_fragmentation(sys, structure)[source]

Refragment the system names according to the residues name of the structure from the biopython module.

Parameters:
  • sys (System) – and instance of the BioSystem class, fragmented in a

  • way (traditional) –

  • structure (Bio.Structure) – a structure coming from the BioPython module

Returns:

a system with a fragmentation that is dependent of the

chain of the molecules

Return type:

System

sequences_to_fragments(chains_to_residues, frag_to_residues)[source]

Defines the lookup array of the sequences to the fragments

Parameters:
  • chains_to_residues (list) – lookup of the chains to the list of recogized residues

  • frag_to_residues (list) – name of the associated fragments in the order of residues list

Returns:

array of the name of the fragments in the same representation of

chain_to_residues

Return type:

list

class BioSystem(filename, sys=None, structure=None, sequence_from_fragments=False, disable_warnings=False, use_native_fragmentation=False, atomic_granularity=True, **kwargs)[source]

Initialize a System for a Biological analysis. Retrieve, from the initial file, the basic fragments, the system sequence, their mutual interactions and the informations about the purity and the observables

Parameters:
  • filename (str) – a pdb file with fragment information inside

  • use_native_fragmentation (bool) – When true, the fragmentation is inferred from the definition in the pdbfile instead than from the Bio.Structure instance. Useful when dealing with non-biological objects, or with malformed biofragments.

  • sys (System) – an initial system, useful for a custom refragmentation. If present, overrides the association written in the pdbfile. It is user’s responsibility to ensure coherency of the total number of atoms.

  • sequence_from_fragments (bool) – method to determine the sequences. If True, the sequence are determined by the fragment names. It is user’s responsibility to provide a inputfile that has a set of residues compatible with the fragmentation.

  • disable_warnings (bool) – disable the BioPython warnings if true

  • structure (Bio.Structure) – a system structure that is meant to replace the one generated by the file parser. Useful for corrupted pdb files. To be used with sequence from_fragments

  • atomic_granularity (bool) – When True, employ atoms as the main units to define the interaction quantities. When False, fragments are employed. Useful for very large systems where atomic matrices may not be necessary, or too demanding.

  • **kwargs – other arguments to be passed to py:func:~BigDFT.IO.read_pdb

classmethod from_sys(system, attributes={}, **kwargs)[source]

Create a BioSystem instance from a previously existing system.

Parameters:
  • system (System) – the system instance.

  • attributes (dict) – attributes of the Biosystem that will be set from the user.

  • **kwargs – other arguments that have to be passed to the BioSystem instantiation.

Returns:

a BioSystem instance

Return type:

BioSystem

classmethod from_logfile(logfile, **kwargs)[source]

Create a BioSystem instance from the information in a logfile.

Parameters:
  • logfile (str) – path of the logfile.

  • **kwargs – other arguments that have to be passed to the BioSystem instantiation.

Returns:

a System population instance

Return type:

BioSystem

classmethod from_archive(archive, **kwargs)[source]

Create a BioSystem instance from the information in a serialized archive.

Parameters:
  • archive (str) – path of the serialized biosystem.

  • **kwargs – other arguments that have to be passed to the BioSystem load function.

Returns:

a System population instance

Return type:

BioSystem

set_archive_name(archive)[source]

Original archive name.

Set the name of the original archive from which this biosystem comes from.

Parameters:

archive (str) – archive name

set_qm_run(log)[source]

Associate the system to a QM run that has been performed by BigDFT. Such run assumes that the positions of the system atoms are consistent with the PDB

Parameters:

log (str) – A path to the Logfile of the BigDFT run

refragment(cutoff=0.05, view=None, groups=[])[source]

Identify the system’s fragmentation and represent it in terms of the systems’ residues. Employs the user defined cutoff to find the correct fragmentation

Parameters:
  • cutoff (float) – cutoff employed for the refragmentation.

  • view (dict) – the view to be imposed as a starting fragmentation in the system.

  • groups (list) – list of groups of the fragments view that will be employed as an initial guess for the fragmentation.

chessboards(cutoffs, initial_view=None, initial_groups=[])[source]

Define a coherent group of fragmentations that are provided from the loosest to tightest cutoffs. The views of the fragmentations are defined such as the tighest cutoffs always contain regrouping of fragments from the loosest

Parameters:
  • cutoffs (list) – list of floating point numbers of the desired cutoffs values.

  • initial_view (dict) – mapping of the initial fragmentation that will be imposed starting from the loosest cutoff.

  • initial_groups (list) – groups of the initial view which will be first considered for fragmentation.

Returns:

dictionary of the view. the values of the dictionary are

length-1 lists such that subsequent views (for instance found when instantiating a population) can be appended

Return type:

dict

property fragment_purities

Define the quality of the fragmentation that is provided in the system

set_fragment_quantities(name, data)[source]

Set fragment quantities to the system.

Include in the biosystem a set of fragment quantities that are associated to the fragments. Such quantities can then be retrieved by the fragment_values method.

Parameters:
  • name (str) – the name of the quantities to invoke

  • data (list, dict) –

    if a list, the quantities in order of fragment_names, otherwise a dictionary containing the

    per-fragment information

fragment_distances(target_fragments)[source]

Calculate the distance between each of the fragment and a list of target fragments.

Parameters:

target_fragments (list) – Id of the fragments to focus on

Returns:

list of the distances in the order of fragment_names

Return type:

array-like

fragment_interaction_and_feedback(target, criteria, environment=None, view=None)[source]

Provides interaction with a target and its feedback.

This method calculates the fragment interactions between a target and completes the result with the feedback of such interaction that an environmental region provides on such target. :param target: Id of the fragments to focus on :type target: list :param criteria: may be ‘hamiltonian’, ‘electrostatic’,

or ‘bond_order’

Parameters:
  • environment (list) – environmental region, separated from the target. If absent, the complementary region of the target is considered.

  • view (dict) – fragmentation view of the system. If present, target should be defined in the view.

Returns:

list of the strengths of the interactions of the

fragments wrt to the target fragments. The order of the list is provided in terms of the fragment_names. The values of target fragments are associated to the feedback interaction with the environment

Return type:

array-like

identify_contact_region(target, cutoff_bo=0.001, cutoff_el=10000000000.0, view=None, environment=None)[source]

Identify the contact with a target.

Single out the portion of the target fragments that interacts with the rest of the system.

Parameters:
  • target (list) – fragments of the target region.

  • cutoff_bo (float) – the value of the Fragment Bond Order above which the fragments are considered to be interacting. Typical values range between 0.01 and 1.e-3.

  • cutoff_el (float) – cutoff for the electrostatic value (Ha). Fragments whose interaction is larger than such value are also included in the contact region.

  • view (dict) – the mapping of the superunits into fragments.

  • environment (list) – environmental region under which restrict the sum.

Returns:

the list of the contact region as well as the value of the

bond order

Return type:

tuple

calculate_contact_regions(target, cutoff_bo=0.001, cutoff_el=10000000000.0, name='', view=None, environment=None)[source]

Identify the contact with a target.

Single out the portion of the target fragments that interacts with the rest of the system, and store it.

Parameters:
  • target (list) – fragments of the target region.

  • cutoff_bo (float) – the value of the Fragment Bond Order above which the fragments are considered to be interacting. Typical values range between 0.01 and 1.e-3.

  • cutoff_el (float) – cutoff for the electrostatic value (kcal/mol). Fragments whose interaction is larger than such value are also included in the contact region.

  • name (str) – Name of the region to be stored in the fragment_values.

  • view (dict) – mapping of the superunits into fragments.

  • environment (list) – the contact region is only calculated taking into accout those fragments.

extract_target_contact_and_counter_region(name='', environment=None)[source]

Extract the contact region in the target and counter-region.

Parameters:
  • name (str) – name of the stored region to extract

  • environment (list) – fragment belonging to the environmental region. The entire complementary region will be considered if absent.

calculate_target_interactions(target, name='', include_dispersion=False, nthreads=1, environment=None, view=None)[source]

Calculate the interaction terms with a target.

This function stores the fragment quantities associated to the interaction term with a target.

Parameters:
  • target (list) – list of the fragments of the target region

  • name (str) – Name of the region to be stored in the fragment values.

  • include_dispersion (bool) – if true, calculate also the dispersion interaction of the target with the environment. The dispersion is only calculated in the contact fragments. Therefore if this value it true it is assumeed that the contact regions have already been calculated and stored.

  • nthreads (int) – number of threads for the parallel evaluation of the dispersion

  • environment (list) – fragments which correspond to the environment region to be considered for the target. If absent, the complementary region will be considered.

  • view (dict) – mapping of the superunits into fragments

dispersion_interactions(region1, region2, nthreads=1, view=None)[source]

Van der Waals interaction strengths.

Extract the dictionary of the dispersion interactions Between two disjoint regions.

Parameters:
  • region1 (list) – list of fragments of region 1

  • region2 (list) – list of fragments of region 2

  • nthreads (int) – number of omp threads to perform the calculation.

  • view (dict) – mapping of the superunits into fragments.

fragment_interaction_strengths(target_fragments, criteria='hamiltonian', view=None)[source]

Interactions between the different fragments of the system, possibly specified on target regions.

Parameters:
  • target_fragments (list) – Id of the fragments to focus on. If a view is provided it should be expressed in terms of the fragments of the view.

  • criteria (str) – may be ‘hamiltonian’, ‘electrostatic’, or ‘bond_order’.

  • view (dict) – view of the fragmentation to be employed to evaluate the strength.

Returns:

list of the strengths of the interactions of the

fragments wrt to the target fragments. If a view is provided the data of each fragment is replicated for each of the superunits belonging to the fragment.

Return type:

array-like

d3PBE_energy(subsystem=None, erase_d3_files=True)[source]

Full dispersion energy of the system, or of a subsystem.

Parameters:
  • subsystem (list) – fragment to focus on. If absent, the entire system is considered.

  • erase_d3_files (bool) – If true remove the d3 files once the calculation is completed.

Returns:

the d3 PBE energy for the PBE functional, in Hartree.

Return type:

float

system_target_interaction_investigation(target, nthreads, environment=None, **kwargs)[source]

Identify the different interactions of the system with a target. :param target: the list of the fragments defining the target. :type target: list :param nthreads: number of threads to calculate the vdW correction. :type nthreads: int :param environment: fragments which will be considered in the

environmental region. If absent the complementary region of the target will be considered.

Parameters:

**kwargs – arguments of calculate_contact_regions

display_sequences(axs=None, labeldict=None, boxcount=None, with_fragment_labels=False, remove_letters=None, **kwargs)[source]

Display the full sequencies of the system. Employ the given field to define the color dict if available

Parameters:
  • field_vals (list) – values of the field to decide the colors of the keys

  • labeldict (dict) – an optional dictionary mapping fragment ids to strings that will be printed on each sequence square.

  • colordict (dict) – the dictionary of the keys, and the corresponding colors. Overrides field_vals if present

  • axs (list) – list of the axis in which to plot the sequences

  • with_fragment_labels (bool) – a boolean useful to represent the fragment numbers. Can be combined with a view to visualize the QM fragments. Provides a default labeldict.

  • remove_letters (list) – list of the letters to be removed from the display. In elements of the chains.

  • **kwargs – arguments to be passed to display of the colordict

Returns:

list of the axis of the sequences

Return type:

list

display_graph(restrict_to=None, bo_cutoff=0.01, fragment_labels=None, fragment_shapes=None, ax=None, colorbar_label='', edge_labels=None, bond_orders=None, edge_colordict_kw=None, node_edge_colordict_kw=None, display_kwargs={}, **kwargs)[source]

Display a graph view of the system.

With this routine the fragments of the system are represented as a Graph, connected by the Fragment Bond Order criteria. Graph nodes can be represented as with a color map that is provided in a similar way than the sequence display.

Parameters:
  • restrict_to (list) – list of fragments to which the graph has to be restricted.

  • bo_cutoff (float) – the limit for a non-negligible chemical link

  • fragment_labels (list, dict) – list (in fragment_names order) of the labels of the fragments or dictionary of the fragments which need relabelling.

  • fragment_shapes (dict) – per-fragment dictionary of the shapes to be employed in the graph drawing. Only non-default values are indicated. If absent, the nodes which belong to the restrict_to list will be represented as squares, and the others as circles.

  • edge_labels (dict) – dictionary of dictionaries: labels to be written next to the edges, with keys indicating the fragment names.

  • ax (matplotlib.pyplot.axes) – matplotlib axis

  • colorbar_label (str) – label to be associated to the colorbar. If None, no colorbar is drawn.

  • bond_orders (dict-like) – dictionary of the bond orders. Supersedes the systems’ bond order if present. Not compatible with a view.

  • edge_colordict_kw (dict) – dictionary of the keyword arguments of the edge colors. The same syntax of the colordict method is allowed, written in terms of the tuple of the fragments. An additional keyword argument, colordict_special, is also allowed. This one overrides the arguments provided by the colordict for the fragment tuples provided.

  • node_edge_colordict_kw (dict) – dictionary of the keyword arguments of the node border colors. The same syntax of the colordict method is allowed.

  • display_kwargs (dict) – dictionary of the keyword arguments to be passed to Graph.display method.

  • **kwargs – colordict arguments.

Returns:

the axes of the displayed graph

Return type:

matplotlib.pyplot.axes

refragmentation_colordict(refrag_keys, **kwargs)[source]

Define a colordict that associates the fragment which go together with the same color and put to white all the previously pure fragments.

Parameters:
  • refrag_keys (list) – Keys of the refragmented system

  • **kwargs – keyword arguments of the initial colordict

Returns:

the dictionary of the colors (equal color means same fragment)

Return type:

dict

colordict(color_by=None, keys=None, field_vals=None, colordict=None, colorcode=None, highlight=None, view=None, vmin=None, vmax=None, highlight_color='red')[source]

Define the colordict to quantify the fragments according to the method.

Parameters:
  • keys (list) – keys of the fragment names.

  • color_by (str) – can be ‘charge’, ‘dipole’, ‘purity’.

  • field_vals (list) – values of the field to decide the colors of the keys.

  • colordict (dict) – the dictionary of the keys, and the corresponding colors. Overrides field_vals if present

  • colorcode (str) –

    the string of the colorcode.

    It represents the colormap of matplotlib. Default is ‘seismic’ for diverging values (i.e. field_dict has negative data), otherwise ‘Reds’.

    highlight (list)color in red only the fragments which

    are indicated in the list.

    view (dict): the fragmentation of the system to aggregate

    the field_vals to.

    highlight_color (matplotlib.colors): color to be employed

    for highlighting.

fragment_label(frag, fmt, shift={})[source]

String that can be used for labelling a fragment.

Parameters:
  • shift (dict) – dictionary of the shifts to be applied to each chain from the first residue ID, with keys corresponding to the ID of the chain to be shifted. Useful for sequence alignment.

  • frag (str) – the fragment name.

  • fmt (list) – list indicating the characters that concatenate the residue attributes. The list should indicate “ch”, “res”, and “num” as the chain, residue name and number, respectively. “letter” can also be provided in case this is available.

Returns:

the label string.

Return type:

str

labeldict(labels={}, shift={}, mappable=None, fmt=['num'], view=None, joinchar='+')[source]

Label the fragments with a recognizable string.

This function returns a dictionary to label the fragments of the system in a way that can be employed in plots or graphs.

Parameters:
  • shift (dict) – dictionary of the shifts to be applied to each chain from the first residue ID, with keys corresponding to the ID of the chain to be shifted. Useful for sequence alignment.

  • mappable (func) – function that returns a string from a system’s fragment name.

  • labels (dict) – dictionary of explicit labels to be given to each fragment. Overrides all the other options for the indicated fragments.

  • view (dict) – a System fragments’ view. If present, fragment names are concatenated unless specified otherwise in mappable and labels.

  • joinchar (str) – character to be used to join labels in case of a view.

  • fmt (list) – list indicating the characters that concatenate the residue attributes. The list should indicate “ch”, “res”, and “num” as the chain, residue name and number, respectively. “letter” can also be provided in case this is available.

Returns:

A dictionary of the considered fragments with

the corresponding names as strings.

Return type:

dict

fragment_values(criteria=None, view=None)[source]

Retrieve the fragment values that are associated to a given criteria. This is a commodity function made to make more intentional some data.

Parameters:
  • criteria (str) – can can be ‘charge’, ‘dipole’, ‘purity’

  • view (dict) – the dictionary of the fragment view. If present, aggregate the values accordingly.

Returns:

in order of the fragment names.

Return type:

list

fragment_scatterplot(errors=None, ax=None, **kwargs)[source]

Defines the scatterplot to have a look to in order to interpret sequence data

Parameters:
  • errors (list) – the errors of the field_vals in the order of fragment_names

  • ax (matplotlib.pyplot.Axes) – Axis of the matplotlib instance

  • **kwargs – the same arguments of the :py:func:colordict method

Returns:

reference to the axis employed for the plot

Return type:

matplotlib.pyplot.Axes

represent(errors=None, with3d=False, with_fragment_labels=False, boxcount=None, remove_letters=None, labeldict=None, **kwargs)[source]

Represent all the data of the systems, both in with the sequence and with the scatterplot. Useful only when there is data to be represented.

Parameters:
  • with3d (bool) – if True represent the 3d vision of the system

  • errors (list) – the errors of the field_vals in the order of fragment_names

  • **kwargs – the same arguments of the py:meth:colordict method

Returns:

list fo the axis of the plot

Return type:

list

unmatched_frag_info()[source]

Retrieve the information about the fragments which have not been matched inside the sequence

Returns:

a Dictionary of the fragment that have not been recognized

Return type:

dict

charge_at_pH(ph, closed_shell=True)[source]

Identify the charge that the system should have for a given pH. Employ the ~py:meth:Bio.Bio.SeqUtils.ProtParam.ProteinAnalysis.charge_at_ph method of BioPython

Parameters:
  • ph (float) – the value of the ph

  • closed_shell (bool) – defines if the charge should be rounded to the nearest even integer considering the total number of electrons of the system.

Returns:

the value of the charge

Return type:

float

charge_by_protonation()[source]

Identify the system’s charge by looking at the protonation state.

The protonation state of each amminoacid is only identified by the total number of atoms of the residue, except for histidine. It is the user’s responsibility to check that the atoms are placed correctly.

Warning

Histidines have to be titrated wit the correct atom names. Hydrogens which are badly named in the delta1 and epsilon2 position may provide incorrect charge.

Returns:

charge of the system, list of residues for which the

charge cannot be determined.

Return type:

tuple

override_fragment_association(fragment_names, new_names=None)[source]

Force the association of the fragment provided by the fragment_names list into the data represented by the new_names. New names should be provided as ‘Ch-RES:num’ where Ch is the chain, RES is the aminoacid residue at the position num.

Parameters:
  • fragments_names (list) – list of the fragments to be reassociated. Should be present in the unmatched_fragment list.

  • new_names (list) – list of the new names to be produced, in fragment_names element order. If absent, the fragment names are employed.

to_archive(archive, version='1.2')[source]

Create the archive from which the System can be loaded again.

Parameters:
  • version (str) – version of the archive.

  • archive (str) – path of the archive to be created.

residue_distribution(criteria=None, field_vals=None)[source]

Create a dictionary of the distribution of field_vals per fragment.

Parameters:
  • field_vals (array-like) – list of the values of the distribution to analyze in element orders.

  • criteria (str) – can can be ‘charge’, ‘dipole’, ‘purity’

Returns:

dictionary representing the data of each residue name.

Return type:

dict

filter_field_vals(field_vals, lookup, fill_by=None)[source]

Select the data in the field_vals array by the lookup.

Performs a copy if lookup is not None

Parameters:
  • field_vals (array-like) – the data to be filtered:

  • lookup (array_like) – the list of the indices to be selected. If none the entire array is returned.

  • fill_by (float) – the value to be employed if the index provided in the lookup array is None. Defaults to numpy.nan.

Returns:

list of the data filtered by lookup array.

performs a copy if the lookup array is not None.

Return type:

array-like

graph_bond(fraglist, threshold, pairwise_bo, restrict_to=None)[source]

Defines the connectivity matrix of the fragments

Parameters:
  • fraglist (list) – the list of the fragment names

  • threshold (float) – the threshold of the cumulative fragment bond order that is applied to define the connectivity

  • pairwise_bo (dict) – list of the fragment BO connections

  • restrict_to (list) – list of fragments to which the graph has to be restricted

Returns:

connectivity matrix of the graph

Return type:

(matrix)

get_BO_network(connectivity_matrix)[source]

Defines the connectivity matrix of the fragments :param connectivity_matrix: matrix of the connnectivities of

the system

Parameters:

pairwise_bo (dict) – list of the fragment BO connections

Returns:

graph,target_nodes,target_edges.

Graph to be plotted and the list of target nodes and edges identified by the restriction if present.

Return type:

tuple

fragment_letters(fragnames, sequences, residue_list, sequences_to_residues)[source]

Give a list of the fragments in terms of their letter

Parameters:
  • fragnames (list) – list of the names of the fragments

  • residue_list (list) – The fragments of the system in the order of the residues of the structure

  • residues (sequences_to) – strings of the protein sequence in the FASTA sequence order

  • sequences (list) – Sequences of the system

Returns:

the labels of the fragments as letters. the label is set to ‘X’

if the letter is not recognized

Return type:

list

residue_letter(ires, sequences, sequences_to_residues)[source]

Find the letter associated with the residue

Parameters:
  • ires (int) – id of the residue

  • sequences (list) – Sequences of the system

  • residues (sequences_to) – strings of the protein sequence in the FASTA sequence order

Returns:

letter of the residue

Return type:

str

interaction_strengths(fragments, target_fragments, pairwise_bo)[source]

Define the interaction strengths of the fragment list with the others with respect to a set of target fragments.

Parameters:
  • fragments (list) – the name of the fragments of the system in pairwise_bo.

  • target_fragments (list) – the fragments constituting the target region.

  • pairwise_bo (dict) – the bond order between fragments.

Returns:

the sum of the weigths of the interactions of the fragments

in the target region.

Return type:

array

find_relevant_fragments(sys, data, criteria, columns=None)[source]

Identify the fragments of the system that fulfill the criteria provided by the function in argument

Parameters:
  • sys (BigDFT.System) – the System to identify

  • data (array, Dataframe, dict) – data to take care of. If ana array, it is considered in order of fragment_names. If it is a dataframe or a dict, the name of fragments are the keys.

  • criteria (func) – function that can be applied to the data items. The function is applied to each of the data elements and the criteria is assumed to be satisfied if any of the element fulfill the criteria

  • columns (list) – the fragment to focus the search to. Useful in the case of a dictionary

Returns:

fragments that fulfill the criteria

Return type:

list

class BioSystemPopulation(systems, representative, **kwargs)[source]

An ensemble of BioSystem. Useful to provide and analyze averaged quantities

Parameters:
  • systems (dict, list) –

    BioSystems defining the populations. They can be provided as a dict of pdb files plus logfiles, or as a dictionary of systems, or as a dictionary of archives for serialization. The dictionary is composed as follows: <label>: {‘archive’: <archive_filename>,

    ’system’: <BioSystem instance>, ‘filename’: <pdbfile>, ‘logfile’: <logfile_path>, ‘weight’: weigth of the sample in the population, ‘mapping’: provides the expression of the fragment of

    the representative in term of the fragments of the item of the population}

    alternatively, a list can be employed, in which case the filling of the population is performed sequentially

  • representative (str, int) – label of the system representative of the population

  • exclude_representative (bool) – if True, the data of the population will be evaluated without considering the representative

  • fragment_values (list) – list of the internal fragment values that can be found inside each of the system. Can be left as an empty list to delegate all the evaluation to the populations.

  • fragment_interactions (list) – list of the attributes which correspond to the quantities which are associated to inter-fragment interactions.

  • to_evaluate (dict) – dictionary of functions to be employed to evaluate at the creation of a population. They must provide a quantity that is related to a system.

  • chessboard_cutoffs (list) – list of the cutoffs that will have to be employed for the fragmentation. Use an empty list to deactivate.

  • initial_view (dict) – view of the superunits fragmentation that will be employed to the representative in the lowest cutoff

  • **kwargs – arguments to be passed at the loading of the system

serialize(archive, chessboard_dict=None, **kwargs)[source]

Dump the entire population into an archive, containing the representative archive, if it is provided as such (otherwise the serialization of the system) as well as the numpy files of all the population data.

Parameters:
  • archive (str) – archive to dump the population to

  • chessboard_dict (dict) – optional dictionary of the fragmentation of the population that has been previously found with a chessboard algorithm. If absent, the internal chessboard is employed

  • **kwargs – arguments to be passed to BioSystemSerialization instantiation in the case of a system.

classmethod load(archive, **kwargs)[source]

Create a BioSystemPopulation instance from a serialized archive.

Parameters:
  • archive (str) – the path of the archive

  • **kwargs – other arguments that have to be passed to the BioSystemPopulation instantiation

Returns:

a biosystem population instance

Return type:

BioSystemPopulation

update_fragmentation(cutoff, view_old, sys)[source]

Update the fragmentation views starting from the last element of a list of views

Parameters:
  • cutoff (float) – the cutoff to employ for the fragmentation

  • view_old (dict) – previous view of the system

  • sys (BioSystem) – the system employed to update the frament list

Returns:

new view of the system

Return type:

dict

reorder(array, of, with_respect_to, mapping=None)[source]

Reorder an array of data of systems in such a way that it can be applied to characterize another one.

Parameters:
  • array (array-like, dict) – data in the order of of.fragment_names. in case of a dictionary, it is assumed that it contains the data coming from interactions or bond orders.

  • of (BioSystem) – the system associated to the original data

  • with_respect_to (BioSystem) – the system to which reorder the data

  • mapping (dict) – dictionary of the view that decomposes the fragments of with_respect_to in terms of the fragment of of in case of multiple elements in the values, an aggregation is considered. If mapping is present, there should not be undefined fragments in their values

Returns:

data in the new order. It contains numpy.nan for the

fragments which are not present in the new system.

Return type:

numpy.array

clean_view(view_orig, superunits)[source]

Define a view of the system that comes from a view of another one. Superunits are cleaned in such a way that no inconsistencies are present in the new view.

Parameters:
  • view_orig (dict) – the original view we would like to clean

  • superunits (list) – list of the new system’s superunits

Returns:

the new view of the system, where combined superunits are

joined with the ‘+’ symbol

Return type:

dict

rename_residue(res)[source]

Rename the residue of the system such that alphabetical order corresponds to sequence order

Parameters:

res (str) – residue of the system, in form “chain-resname:pos”

Returns:

residue in the form “chain-pos:letter”

Return type:

str

Warning

Only put letters to residues which have letter associated

rename_labels(labels, as_dict=False, from_dict=False, mappable=<function rename_residue>)[source]

Rename the names of the residues of a BioSystem in order to be compatible with other naming schemes.

Parameters:
  • labels (iterable) – contains the list of residue to be renamed

  • as_dict (bool) – if True, returns a dictionary as a relabel

  • from_dict (bool) – if True assumes that the remapping has to be performed also on the labels values.

  • mappable (func) – the function that will be applied for renaming

Returns:

dictionary or list of the renamed labels, depending on

the arguments.

Return type:

dict, list